 confidential data


Time-uniform and Asymptotic Confidence Sequence of Quantile under Local Differential Privacy

Neural Information Processing Systems

In this paper, we develop a novel algorithm for constructing time-uniform, asymptotic confidence sequences for quantiles under local differential privacy (LDP). The procedure combines dynamically chained parallel stochastic gradient descent (P-SGD) with a randomized response mechanism, thereby guaranteeing privacy protection while simultaneously estimating the target quantile and its variance. A strong Gaussian approximation for the proposed estimator yields asymptotically anytime-valid confidence sequences whose widths obey the law of the iterated logarithm (LIL). Moreover, the method is fully online, offering high computational efficiency and requiring only O(κ) memory, where κ denotes the number of chains and is much smaller than the sample size. Rigorous mathematical proofs and extensive numerical experiments demonstrate the theoretical soundness and practical effectiveness of the algorithm.
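The combination of randomized response with stochastic gradient descent described in this abstract can be illustrated with a much simplified, single-chain sketch. It is not the paper's chained P-SGD procedure and it produces no confidence sequence; it only shows how a quantile might be tracked when each user releases a single privatized bit. The function name `ldp_quantile_sgd`, the step-size schedule, and the use of iterate averaging are assumptions made for illustration.

```python
import math
import random


def ldp_quantile_sgd(stream, tau, epsilon, theta0=0.0,
                     step_scale=1.0, step_power=0.6, seed=0):
    """Estimate the tau-quantile from a stream under epsilon-LDP (sketch).

    Each user only reveals a randomized-response version of the bit
    1{x <= theta}, where theta is the current public estimate.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    # Probability of reporting the true bit under randomized response.
    p_truth = math.exp(epsilon) / (1.0 + math.exp(epsilon))
    theta = theta0
    theta_bar = 0.0  # running average of the iterates
    for n, x in enumerate(stream, start=1):
        true_bit = 1 if x <= theta else 0
        # Randomized response: keep the bit with prob. p_truth, else flip it.
        reported = true_bit if rng.random() < p_truth else 1 - true_bit
        # Debias so that the expected value equals the true bit.
        debiased = (reported - (1.0 - p_truth)) / (2.0 * p_truth - 1.0)
        # SGD step on the quantile loss; E[debiased] - tau = F(theta) - tau.
        theta -= step_scale * n ** (-step_power) * (debiased - tau)
        theta_bar += (theta - theta_bar) / n
    return theta_bar


if __name__ == "__main__":
    data_rng = random.Random(1)
    data = [data_rng.gauss(0.0, 1.0) for _ in range(100_000)]
    print(ldp_quantile_sgd(data, tau=0.5, epsilon=1.0))
```

The sketch keeps only a constant amount of state per chain (the current iterate and its running average), which is in the spirit of the online, low-memory property the abstract mentions.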


Exact Privacy Guarantees for Markov Chain Implementations of the Exponential Mechanism with Artificial Atoms

Neural Information Processing Systems

Existing work has examined these effects asymptotically, but implementable finite sample results are needed in practice so that users can specify privacy budgets in advance and implement samplers with exact privacy guarantees.


Data Augmentation MCMC for Bayesian Inference from Privatized Data

Neural Information Processing Systems

Differentially private mechanisms protect privacy by introducing additional randomness into the data. Restricting access to only the privatized data makes it challenging to perform valid statistical inference on parameters underlying the confidential data. Specifically, the likelihood function of the privatized data requires integrating over the large space of confidential databases and is typically intractable. For Bayesian analysis, this results in a posterior distribution that is doubly intractable, rendering traditional MCMC techniques inapplicable. We propose an MCMC framework to perform Bayesian inference from the privatized data, which is applicable to a wide range of statistical models and privacy mechanisms.
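As a rough illustration of the data augmentation idea, the sketch below treats the confidential records as latent variables and alternates between updating the parameter and updating the records. The toy model is an assumption chosen for brevity and is not taken from the paper: binary records drawn from Bernoulli(θ), a Beta(1, 1) prior, and a released count equal to the true sum plus Laplace noise with scale 1/ε. The function name `da_mcmc_private_count` is hypothetical.

```python
import math
import random


def da_mcmc_private_count(s_dp, n, epsilon, iters=2000, burn_in=500, seed=0):
    """Data augmentation sampler for theta given a noisy count (sketch).

    Assumed toy model: x_i ~ Bernoulli(theta), theta ~ Beta(1, 1),
    s_dp = sum(x) + Laplace(scale=1/epsilon).
    """
    rng = random.Random(seed)
    theta = 0.5
    # Latent confidential database, initialised from the model.
    x = [1 if rng.random() < theta else 0 for _ in range(n)]
    total = sum(x)
    draws = []
    for it in range(iters):
        # Step 1: conjugate update of theta given the imputed records.
        theta = rng.betavariate(1 + total, 1 + n - total)
        # Step 2: update each record with a proposal drawn from the model.
        for i in range(n):
            proposal = 1 if rng.random() < theta else 0
            new_total = total - x[i] + proposal
            # Model terms cancel, so the acceptance ratio is the ratio of
            # Laplace mechanism densities at the released value s_dp.
            log_ratio = epsilon * (abs(s_dp - total) - abs(s_dp - new_total))
            if rng.random() < math.exp(min(0.0, log_ratio)):
                x[i] = proposal
                total = new_total
        if it >= burn_in:
            draws.append(theta)
    return draws


if __name__ == "__main__":
    samples = da_mcmc_private_count(s_dp=61.3, n=200, epsilon=1.0)
    print(sum(samples) / len(samples))
```

Because the proposal for each record comes from the model itself, the intractable integral over confidential databases never has to be evaluated; only the density of the privacy mechanism appears in the acceptance step.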




Privacy Meets Explainability: Managing Confidential Data and Transparency Policies in LLM-Empowered Science

arXiv.org Artificial Intelligence

As Large Language Models (LLMs) become integral to scientific workflows, concerns over the confidentiality and ethical handling of confidential data have emerged. This paper explores data exposure risks through LLM-powered scientific tools, which can inadvertently leak confidential information, including intellectual property and proprietary data, from scientists' perspectives. We propose "DataShield", a framework designed to detect confidential data leaks, summarize privacy policies, and visualize data flow, ensuring alignment with organizational policies and procedures. Our approach aims to inform scientists about data handling practices, enabling them to make informed decisions and protect sensitive information. Ongoing user studies with scientists are underway to evaluate the framework's usability, trustworthiness, and effectiveness in tackling real-world privacy challenges.


A hacking group reportedly leaked confidential data from thousands of Disney Slack channels.

Engadget

A hacking group leaked over a terabyte of confidential data from more than 10,000 Slack channels belonging to Disney, the Wall Street Journal reported on Monday. The leaked information includes discussions about ad campaigns, computer code, details about unreleased projects, and discussions about interview candidates, among other things. "Disney is investigating this matter," a company spokesperson told the Journal. Nullbulge calls itself a hacktivist group advocating for the rights of artists. A spokesperson for the group told the Journal that it targeted Disney due to concerns about the company's handling of artist contracts and its approach to generative AI.


Conditional Density Estimations from Privacy-Protected Data

arXiv.org Machine Learning

Many modern statistical analysis and machine learning applications require training models on sensitive user data. Differential privacy provides a formal guarantee that individual-level information about users does not leak. In this framework, randomized algorithms inject calibrated noise into the confidential data, resulting in privacy-protected datasets or queries. However, restricting access to only privatized data during statistical analysis makes it computationally challenging to make valid inferences on the parameters underlying the confidential data. In this work, we propose simulation-based inference methods from privacy-protected datasets. In addition to sequential Monte Carlo approximate Bayesian computation, we use neural conditional density estimators as a flexible family of distributions to approximate the posterior distribution of model parameters given the observed private query results. We illustrate our methods on discrete time-series data under an infectious disease model and with ordinary linear regression models. Illustrating the privacy-utility trade-off, our experiments and analysis demonstrate the necessity and feasibility of designing valid statistical inference procedures to correct for biases introduced by the privacy-protection mechanisms.


Confidential computing provides revolutionary data encryption, UC Berkeley professor says

#artificialintelligence

Confidential computing is a potentially revolutionary technology in terms of its impact on data security. In confidential computing, data remains encrypted not just at rest and in transit but also in use, allowing analytics and machine learning (ML) to be performed on the data while maintaining its confidentiality. The capability to encrypt data in use opens up a massive range of possible real-world scenarios, and it has major implications and potential benefits for the future of data security. VentureBeat spoke with Raluca Ada Popa about her research and work in developing practical solutions for confidential computing.